48 research outputs found
Evolutionary Algorithms to Discover Quantitative Association Rules
Extraordinary Doctorate Award U
Analysis of Measures of Quantitative Association Rules
This paper analyses the relationships among different
interestingness measures of association-rule quality as a first step
towards selecting the best objectives for a multi-objective algorithm.
For this purpose, association rules are discovered with
evolutionary techniques. Specifically, a genetic algorithm is used
to mine quantitative association rules and determine the intervals
on the attributes without discretizing the data beforehand. The algorithm
has been applied to real-world climatological datasets based on ozone and
earthquake data.
Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucía P07-TIC-0261
Cis-cop: Multiobjective identification of cis-regulatory modules based on constrains
Gene expression regulation is an intricate,
dynamic phenomenon essential for all biological functions. The necessary instructions for
gene expression are encoded in cis-regulatory
elements that work together and interact
with the RNA polymerase to confer specific
spatial and temporal patterns of transcription. Therefore, the identification of these elements is currently an active area of research
in computational analysis of regulatory sequences. However, the problem is difficult
since the combinatorial interactions between
the regulating factors can be very complex.
Here we present a web server, Cis-cop, that
identifies cis-regulatory modules given a set
of transcription factor binding sites and, additionally, RNA pol sites for a group of
genes.
EVFUZZYSYSTEM: evolution of fuzzy systems for multi-dimensional regression problems
This work presents EvFuzzySystem, an
evolutionary method that enables the complete design of fuzzy logic systems by
simultaneously generating appropriate membership functions and
rule sets. EvFuzzySystem extends a method
originally designed to solve
problems defined by two inputs and one
output. This extension was not trivial from
a computational standpoint. The results show that it can be applied to regression problems with any
number of inputs and that the results
obtained are comparable to those of existing
methods.
Funding: Comisión Interministerial de Ciencia y Tecnología (CICYT) TIN2005-08386-C05-03; Junta de Andalucía PC06-TIC-02025; Universidad de Jaén UJA-08-16- 3
Analysis of the evolution of the Spanish labour market through unsupervised learning
Unemployment in Spain is one of the biggest concerns of its inhabitants. Its unemployment rate is the second highest in the European Union, and in the second quarter of 2018 it stood at 15.2%, some 3.4 million unemployed people. Construction is one of the activity sectors that suffered the most from the economic crisis. In addition, the economic crisis affected the labour market in different ways depending on occupation level or location. The aim of this paper is to discover how the labour market is organised, taking into account the jobs that workers obtained during two periods: 2011-2013, which corresponds to the economic crisis, and 2014-2016, which was a period of economic recovery. The data used are official records of the Spanish administration corresponding to 1.9 and 2.4 million job placements, respectively. The labour market was analysed by applying unsupervised machine learning techniques to obtain clear and structured information on the employment generation process and the underlying labour mobility. We applied two clustering methods with two different technologies, and the results indicate that there were some movements in the Spanish labour market which changed the character of some of the jobs. The analysis reveals the changes in the labour market: the crisis forced greater geographical mobility and favoured the subsequent emergence of new job sources. Nevertheless, some clusters remained stable despite the crisis. We conclude that we have characterised some important groups of workers in Spain. The methodology used, being supported by Big Data techniques, could serve to analyse any other job market.
Funding: Ministerio de Economía y Competitividad TIN2014-55894-C2-R and TIN2017-88209-C2-2-R, CO2017-8678
An evolutionary algorithm to discover quantitative association rules in multidimensional time series
An evolutionary approach for finding existing
relationships among several variables of a multidimensional
time series is presented in this work. The proposed model to
discover these relationships is based on quantitative association
rules. This algorithm, called QARGA (Quantitative
Association Rules by Genetic Algorithm), uses a particular
codification of the individuals that allows solving two basic
problems. First, it does not perform a previous attribute
discretization and, second, it is not necessary to set which
variables belong to the antecedent or consequent. Therefore,
it may discover all underlying dependencies among
different variables. To evaluate the proposed algorithm,
three experiments have been carried out. As an initial step,
several public datasets have been analyzed with the purpose
of comparing with other existing evolutionary approaches.
Also, the algorithm has been applied to synthetic time series
(where the relationships are known) to analyze its potential
for discovering rules in time series. Finally, a real-world
multidimensional time series composed of several climatological
variables has been considered. All the results show
a remarkable performance of QARGA.
Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C02-02; Junta de Andalucía P07-TIC-0261
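As an illustration of the rule format described above, the following is a minimal sketch of how a quantitative association rule with interval conditions might be scored by support and confidence directly on continuous data, with no prior discretization. It is not the QARGA implementation: the function names, the dictionary-based rule encoding, and the four toy records are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]          # closed interval [low, high]
Conditions = Dict[str, Interval]        # attribute name -> interval


def covers(row: Dict[str, float], conditions: Conditions) -> bool:
    """True if every attribute of the row lies inside its interval."""
    return all(low <= row[attr] <= high
               for attr, (low, high) in conditions.items())


def support_confidence(rows: List[Dict[str, float]],
                       antecedent: Conditions,
                       consequent: Conditions) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent => consequent."""
    n_ant = sum(covers(r, antecedent) for r in rows)
    n_both = sum(covers(r, antecedent) and covers(r, consequent)
                 for r in rows)
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_ant if n_ant else 0.0
    return support, confidence


# Toy records invented for illustration (not data from the paper).
rows = [
    {"temperature": 31.0, "wind": 2.0, "ozone": 120.0},
    {"temperature": 29.5, "wind": 3.5, "ozone": 110.0},
    {"temperature": 18.0, "wind": 9.0, "ozone": 40.0},
    {"temperature": 30.2, "wind": 8.0, "ozone": 55.0},
]

# Hypothetical rule: temperature in [28, 35] => ozone in [100, 150]
antecedent = {"temperature": (28.0, 35.0)}
consequent = {"ozone": (100.0, 150.0)}
support, confidence = support_confidence(rows, antecedent, consequent)
```

Because the antecedent and consequent are just two dictionaries over the same attribute set, any attribute can appear on either side of the rule, which mirrors the abstract's point that the roles of the variables need not be fixed in advance.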
On the use of algorithms to discover motifs in DNA sequences
Many approaches are currently devoted to finding
DNA motifs in nucleotide sequences. However, this task remains
challenging for specialists because of the difficulty
of deeply understanding gene regulatory mechanisms,
especially when analyzing binding sites in DNA. These sites or
specific nucleotide sequences are known to be responsible for
transcription processes. Thus, this work aims at providing an
updated overview on strategies developed to discover meaningful
motifs in DNA-related sequences, and, in particular, their
attempts to find out relevant binding sites. From all existing
approaches, this work is focused on dictionary, ensemble, and
artificial intelligence-based algorithms since they represent the
classical and the leading ones, respectively.
Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucía P07-TIC-02611
Selecting the best measures to discover quantitative association rules
The majority of the existing techniques to mine association rules typically use the support and the confidence to
evaluate the quality of the rules obtained. However, these two measures may not be sufficient to properly assess
their quality due to some inherent drawbacks they present. A review of the literature reveals that there exist many
measures to evaluate the quality of the rules, but that the simultaneous optimization of all measures is complex and
might lead to poor results. In this work, a principal components analysis is applied to a set of measures that evaluate
quantitative association rules' quality. From this analysis, a reduced subset of measures has been selected to be
included in the fitness function in order to obtain better values for the whole set of quality measures, and not only
for those included in the fitness function. This is a general-purpose methodology and can, therefore, be applied to
the fitness function of any algorithm. To validate whether better results are obtained when using the fitness function
composed of the subset of measures proposed here, the existing QARGA algorithm has been applied to a wide
variety of datasets. Finally, a comparative analysis of the results obtained by means of the application of QARGA
with the original fitness function is provided, showing a remarkable improvement when the new one is used.
Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C0
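To make the idea of a principal components analysis over quality measures concrete, here is a minimal, dependency-free sketch that builds the correlation matrix of a few measures and extracts its first principal component by power iteration. It only illustrates the general technique, not the paper's procedure: the three measures chosen, the toy values for five rules, and all function names are assumptions, and a real analysis would use a full eigendecomposition from a numerical library.

```python
import math
from typing import List, Tuple


def standardize(columns: List[List[float]]) -> List[List[float]]:
    """Centre each measure and scale it to unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / std for x in col])  # assumes std > 0
    return out


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Pearson correlation between every pair of measures."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix: List[List[float]],
                    iterations: int = 200) -> Tuple[float, List[float]]:
    """Dominant eigenvalue and eigenvector by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(m * x for m, x in zip(row, v))
                     for vi, row in zip(v, matrix))
    return eigenvalue, v


# Toy values of three measures over five rules (invented for illustration).
measures = {
    "support":    [0.10, 0.20, 0.30, 0.40, 0.50],
    "confidence": [0.55, 0.60, 0.70, 0.85, 0.90],
    "lift":       [1.8, 1.1, 1.5, 1.2, 1.6],
}
corr = correlation_matrix(list(measures.values()))
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(measures)  # share of total variance
```

Measures with large loadings of the same sign on one component carry largely redundant information, so a reduced subset can keep one representative per group; in this toy data support and confidence move together while lift does not.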
Quantitative Association Rules Applied to Climatological Time Series Forecasting
This work presents the discovery of association rules based on evolutionary techniques in order to obtain relationships among correlated time series. For this purpose, a genetic algorithm has been proposed to determine the intervals that form the rules without discretizing the attributes and allowing the regions covered by the rules to overlap. In addition, the algorithm has been tested on real-world climatological time series such as temperature, wind and ozone, and the results are reported and compared with those of the well-known Apriori algorithm.
Obtaining optimal quality measures for quantitative association rules
Several works in the literature have proposed fitness functions based on a combination of weighted measures for
the discovery of association rules. Nevertheless, the values of the measures used to assess
the quality of association rules can differ according to the weights of the measures included in the
fitness function. Therefore, the user's decision is very important in order to specify the weights of the measures involved in
the optimization process. This paper presents a study of well-known quality measures with regard to the weights of the
measures that appear in a fitness function. In particular, the fitness function of an existing evolutionary algorithm called
QARGA has been considered with the purpose of suggesting the values that should be assigned to the weights,
depending on the set of measures to be optimized. As an initial step, several experiments have been carried out on 35
public datasets in order to show how the weights for the confidence, support, amplitude and number of attributes
measures included in the fitness function influence different quality measures according to several
minimum support thresholds. Second, statistical tests have been conducted to evaluate when the differences in
the measures of the rules obtained by QARGA are significant, and thus to provide the best weights to be considered
depending on the group of measures to be optimized. Finally, the results obtained when using the recommended
weights for two real-world applications related to ozone and earthquakes are reported.
Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
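A weighted fitness function of the kind discussed above can be sketched as follows. This is a hedged illustration rather than QARGA's actual formula: the abstract names the four measures (confidence, support, amplitude, number of attributes), but the signs, the normalizations, the function name, and the example values below are assumptions made for this sketch.

```python
from typing import Dict, List


def weighted_fitness(support: float,
                     confidence: float,
                     n_attributes: int,
                     total_attributes: int,
                     amplitudes: List[float],
                     ranges: List[float],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of rule measures (illustrative form only).

    Assumptions: support, confidence and the fraction of attributes
    used are rewarded; wide intervals are penalized through the mean
    interval width relative to each attribute's full range.
    """
    mean_amplitude = sum(a / r for a, r in zip(amplitudes, ranges)) / len(amplitudes)
    return (weights["support"] * support
            + weights["confidence"] * confidence
            + weights["attributes"] * (n_attributes / total_attributes)
            - weights["amplitude"] * mean_amplitude)


# Hypothetical rule: 2 of 4 attributes, intervals of width 5 and 10
# over attribute ranges of 20 and 40.
weights = {"support": 1.0, "confidence": 1.0,
           "attributes": 1.0, "amplitude": 1.0}
score = weighted_fitness(support=0.5, confidence=0.75,
                         n_attributes=2, total_attributes=4,
                         amplitudes=[5.0, 10.0], ranges=[20.0, 40.0],
                         weights=weights)
```

Raising one weight relative to the others shifts which rules score highest, which is why the choice of weights depends on the group of measures the user wants to optimize.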